DETAILED ACTION

1. In response to the communication dated 03/23/2007, claims 46-67 are pending in this
application. 

2. This application is a continuation of Application No. 09/768,947, now U.S. Patent No. 6,658,423.

Claim Rejections - 35 USC §101 

35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or 
any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and 
requirements of this title. 

3. Claims 48-67 are rejected under 35 U.S.C. 101 because the claimed invention is directed 
to non-statutory subject matter.

As set forth in MPEP 2106(II)(A):

Identify and Understand Any Practical Application Asserted for the Invention

The claimed invention as a whole must accomplish a practical application. That is, it must produce a
"useful, concrete and tangible result." State Street, 149 F.3d at 1373, 47 USPQ2d at 1601-02. The
purpose of this requirement is to limit patent protection to inventions that possess a certain level
of "real world" value, as opposed to subject matter that represents nothing more than an idea or
concept, or is simply a starting point for future investigation or research (Brenner v. Manson, 383
U.S. 519, 528-36, 148 USPQ 689, 693-96; In re Ziegler, 992 F.2d 1197, 1200-03, 26 USPQ2d
1600, 1603-06 (Fed. Cir. 1993)). Accordingly, a complete disclosure should contain some
indication of the practical application for the claimed invention, i.e., why the applicant believes
the claimed invention is useful.

Apart from the utility requirement of 35 U.S.C. 101, usefulness under the patent
eligibility standard requires significant functionality to be present to satisfy the useful result 
aspect of the practical application requirement. See Arrhythmia, 958 F.2d at 1057, 22 USPQ2d at 
1036. Merely claiming nonfunctional descriptive material stored in a computer-readable medium 
does not make the invention eligible for patenting. For example, a claim directed to a word 
processing file stored on a disk may satisfy the utility requirement of 35 U.S.C. 101 since the 
information stored may have some "real world" value. However, the mere fact that the claim may 
satisfy the utility requirement of 35 U.S.C. 101 does not mean that a useful result is achieved 
under the practical application requirement. The claimed invention as a whole must produce a 
"useful, concrete and tangible" result to have a practical application. 

The claimed invention is subject to the test of State Street, 149 F.3d at 1373-74, 47
USPQ2d at 1601-02. Specifically, State Street sets forth that the claimed invention must produce
a "useful, concrete and tangible result." The Interim Guidelines for Examination of Patent
Applications for Patent Subject Matter Eligibility state in section IV.C.2.b.(2) (on page 21 of
the PDF version):

The tangible requirement does not necessarily mean that a claim must either be tied to a particular machine 
or apparatus or must operate to change articles or materials to a different state or thing. However, the 
tangible requirement does require that the claim must recite more than a §101 judicial exception, in that the 
process claim must set forth a practical application of that §101 judicial exception to produce a real-world 
result. Benson, 409 U.S. at 71-72, 175 USPQ at 676-77 (invention ineligible because it had "no substantial
practical application").

The claimed invention (claims 48 and 67) recites a machine-readable medium having stored
thereon a plurality of records, which does not satisfy the useful result aspect of the practical
application requirement. Merely claiming nonfunctional descriptive material stored in a
machine-readable medium does not make the invention eligible for patenting. The claims recite a
functional descriptive hash function, which is used to hash each of the elements to determine
which one of the plurality of lists each of the elements will be contained in. However, this
determination is performed outside of the machine-readable medium and has nothing to do with
the functional aspect of the medium. Thus, merely reciting nonfunctional descriptive material
(fields and lists) by putting it into memory does not lead to a practical application.

The claimed invention (claim 49) recites a method for determining whether two documents
are near-duplicates, comprising, for each of the two documents, generating at least two
fingerprints, and determining whether or not the two documents are near-duplicate documents.
These steps do not provide a useful and tangible result, as it is not shown that their execution
accomplishes a practical application. This claim recites software per se, which is not tangible.
Moreover, the claim lacks a practical application in that it does not specify how the system would
operate if it is determined that the fingerprint of the first of the two documents does not match
the fingerprint of the second of the two documents. The newly added limitation does not cure the
lack of practical application, because the determination of whether or not the two documents are
near-duplicates is merely intended for use in different operations and does not actually involve
any function at all.

The claimed invention (claims 50-67) recites a machine-readable medium having stored
thereon a plurality of records, which does not satisfy the useful result aspect of the practical
application requirement. Merely claiming nonfunctional descriptive material stored in a
machine-readable medium does not make the invention eligible for patenting. Merely reciting
nonfunctional descriptive material, such as fields and lists, by putting it into memory does not
lead to a practical application.



Claim Rejections - 35 USC § 102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the 
basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on 
sale in this country, more than one year prior to the date of application for patent in the United States, 
(e) the invention was described in (1) an application for patent, published under section 122(b), by another filed 
in the United States before the invention by the applicant for patent or (2) a patent granted on an application for 
patent by another filed in the United States before the invention by the applicant for patent, except that an 
international application filed under the treaty defined in section 351(a) shall have the effects for purposes of this 
subsection of an application filed in the United States only if the international application designated the United 
States and was published under Article 21(2) of such treaty in the English language. 

4. Claim 49 is rejected under 35 U.S.C. 102(e) as being anticipated by Broder (US 
6,119,124). 

Regarding claim 49, Broder discloses a method for determining whether two documents 
are near-duplicates (See col. 4, line 6 et seq.), the method comprising: 

a) for each of the two documents, generating at least two fingerprints (See col. 4, lines 
19-24, wherein unique identifications of a document can be computed as digital fingerprints 
corresponding to at least two fingerprints for each document); and 



b) determining whether or not the two documents are near-duplicate documents by
i) determining whether or not any one of the at least two fingerprints of a first of
the two documents matches any one of the at least two fingerprints of a second of
the two documents (See col. 10, lines 27-29), and

ii) if it is determined that any one fingerprint of the at least two fingerprints of the
first of the two documents does match any one fingerprint of the at least two
fingerprints of the second of the two documents, then concluding that the two
documents are near-duplicates (See col. 10, lines 27-29); and

c) using the determination of whether or not the two documents are near-duplicates in at 
least one of (A) an act of serving search results corresponding to documents, (B) an act of 
crawling documents, (C) an act of indexing documents (See col. 2, lines 27-30), and (D) an act 
of fixing a broken link to at least one of the two documents. 
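
For illustration only, steps (a) and (b) of claim 49 as mapped above may be sketched as
follows. This is a minimal sketch under stated assumptions and is not taken from Broder or from
the application: the fingerprint (the first 8 bytes of a SHA-1 digest), the splitting of each
document into four contiguous word runs, and the function names are all hypothetical.

```python
import hashlib


def fingerprint(data: str) -> int:
    # Hypothetical fingerprint: first 8 bytes of a SHA-1 digest, as an integer.
    return int.from_bytes(hashlib.sha1(data.encode("utf-8")).digest()[:8], "big")


def document_fingerprints(text: str, num_parts: int = 4) -> set[int]:
    # Step (a), assumed approach: split the document's words into num_parts
    # contiguous runs and fingerprint each run, so a document of at least
    # two words yields at least two fingerprints.
    words = text.split()
    size = max(1, -(-len(words) // num_parts))  # ceiling division
    parts = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    return {fingerprint(p) for p in parts}


def are_near_duplicates(fps_a: set[int], fps_b: set[int]) -> bool:
    # Step (b): the documents are near-duplicates if ANY fingerprint of the
    # first document matches ANY fingerprint of the second document.
    return not fps_a.isdisjoint(fps_b)
```

Under these assumptions, two documents that differ in a single word still share the fingerprints
of their unchanged runs and are therefore concluded to be near-duplicates under step (b)(ii).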

5. Claims 50-54, 58-60, 61-63, and 64-66 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Johnson (U.S. Patent No. 5,850,490). 

Regarding claims 50-54, Johnson discloses a machine-readable medium having stored 
thereon a plurality of records (See Fig. 16), each of the records comprising: 

a) a first field for storing a document identifier (Document Identifier field 0001,
Fig. 16); and

b) a plurality of lists (Segment class 1, segment class 2, segment class 3..., Fig. 
16), each of the plurality of lists containing elements of a document identified by 
the document identifier stored in the first field ("Cowherds of the Deep" for 
example, Fig. 16, wherein, words in "Cowherds of the Deep" corresponding to 
elements of a document), 

Johnson teaches a plurality of records organized into a table, each record reflecting a
document (Fig. 16, entry 452), wherein each record has a plurality of segment classes (lists).
Since a document may not share the same keywords as other documents, some of Segment
classes 1, 2, 3 include no keyword (element). Thus, Johnson teaches wherein at least
some of the plurality of lists include different numbers of elements and wherein at least one of
the plurality of lists includes no elements, as per claims 50 and 51.

Johnson teaches wherein contiguous elements in a document are not necessarily
contiguous elements of a list, as one having ordinary skill in the art would have recognized that
the table in Fig. 16 stores a plurality of segment classes for keywords in documents; thus,
keywords in a document are not necessarily contiguous in the key fields, as per claim 52.

Johnson teaches wherein, for each of the records, the number of lists is the same, as each
of the records has the same number of segment classes, as per claim 53. Whether or not a
document shares keywords with other documents, every document record still has the same
number of segment classes (lists), ranging from 1 to ..., in the table; therefore, Johnson teaches
wherein a number of the plurality of lists is independent of document size, as per claim 54.

Regarding claims 58, 61 and 64, Johnson discloses wherein each of the elements of
a document is an element that has been extracted from the document (See col. 20, lines 20-31).

Regarding claims 59, 62 and 65, Johnson discloses wherein each of the elements of a
document is a predetermined one of (A) a predetermined number of words (See col. 20, lines 28-31,
for example, "It was a dark"), (B) a predetermined number of sentences, (C) a predetermined
number of characters, (D) a predetermined number of paragraphs, and (E) a predetermined
number of sections.

Regarding claims 60, 63 and 66, Johnson discloses wherein each of the elements of a
document partially overlaps another of the elements of the document (See col. 20, lines 28-31).
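
For illustration only, the limitations addressed in claims 59 and 60 (each element being a
predetermined number of words, and each element partially overlapping another) may be
sketched as follows. This hypothetical sketch is not code from Johnson; it assumes whitespace
tokenization and a window of four words, chosen only to reproduce the "It was a dark" example,
and the function name is an assumption.

```python
def word_shingles(text: str, k: int = 4) -> list[str]:
    """Return every run of k consecutive words (hypothetical helper).

    Adjacent runs share k - 1 words, so each element partially overlaps
    the next. A document shorter than k words yields a single element.
    """
    words = text.split()
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
```

With k = 4, the text "It was a dark and stormy night" yields the elements "It was a dark",
"was a dark and", "a dark and stormy" and "dark and stormy night", each sharing three words
with its neighbor.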

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

6. Claim 48 is rejected under 35 U.S.C. 103(a) as being unpatentable over Johnson (U.S. 
Patent No. 5,850,490), in view of Fujiwara (U.S. Patent No. 6,381,601). 

Regarding claim 48, Johnson discloses a machine-readable medium having stored 
thereon a plurality of records (See Fig. 16), each of the records comprising: 

a) a first field for storing a document identifier (Document Identifier field 0001,
Fig. 16); and

b) a plurality of lists (Segment class 1, segment class 2, segment class 3..., Fig.
16), each of the plurality of lists containing elements of a document identified by
the document identifier stored in the first field ("Cowherds of the Deep" for
example, Fig. 16, wherein, words in "Cowherds of the Deep" corresponding to
elements of a document),

However, Johnson is silent as to wherein a hash function is used to hash each of the
elements in order to determine which of the plurality of lists each of the elements will be
contained in. Fujiwara, on the other hand, teaches using a hash function to hash each of the
elements in order to determine which of the plurality of lists each of the elements will be
contained in (See col. 2, lines 57-62; col. 4, lines 47-62; col. 5, line 50 to col. 6, line 10, Fujiwara
et al.). It would have been obvious to one having ordinary skill in the art at the time the
invention was made to use a hash function to hash each of the elements in order to determine
which of the plurality of lists each of the elements will be contained in. The motivation
would have been to reduce or remove duplicate elements by using a hash function.
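
For illustration only, the arrangement recited in claim 48 (a record holding a document
identifier and a plurality of lists, with a hash function selecting the list for each element) may
be sketched as follows. This hypothetical sketch is not code from Johnson, Fujiwara, Judd, or the
application; the choice of SHA-256, the fixed count of four lists, and the names bucket, Record and
build_record are assumptions.

```python
import hashlib
from dataclasses import dataclass, field

NUM_LISTS = 4  # assumption: every record has the same, fixed number of lists


def bucket(element: str, num_lists: int = NUM_LISTS) -> int:
    """Hash an element and map it to one of num_lists list indices.

    hashlib is used instead of Python's built-in hash(), whose value for
    strings can change between runs, so identical elements in different
    documents always select the same list index.
    """
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_lists


@dataclass
class Record:
    doc_id: str  # first field: the document identifier
    lists: list[list[str]] = field(default_factory=list)  # one list per hash bucket


def build_record(doc_id: str, elements: list[str], num_lists: int = NUM_LISTS) -> Record:
    """Place each element of a document into the list selected by its hash."""
    lists: list[list[str]] = [[] for _ in range(num_lists)]
    for element in elements:
        lists[bucket(element, num_lists)].append(element)
    return Record(doc_id, lists)
```

Under these assumptions, the number of lists does not depend on the length of the document, a
short document leaves some lists empty, and consecutive elements of a document generally fall
into different lists (compare claims 50-54).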

7. Claims 48, 55-57, and 67 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Judd (U.S. Patent No. 6,360,215), in view of Fujiwara (U.S. Patent No. 6,381,601). 

Regarding claim 48, Judd discloses a machine-readable medium having stored thereon a 
plurality of records (Index 16, Fig. 1 and col. 6, lines 47-48 and lines 66-67), each of the records 
comprising: 

a) a first field for storing a document identifier (See col. 7, lines 45-46, wherein the
location identifier of the current document corresponds to the "document identifier" and the
document index record corresponds to the "record"); and

b) a plurality of lists, each of the plurality of lists containing elements of a
document identified by the document identifier stored in the first field (See col. 7,
lines 45-50, wherein the columns (lists) of the record contain values of properties
(elements of the document) such as document title and document summary. One having
ordinary skill in the art would have recognized that a document title or document
summary would contain words, characters or sentences),

Judd teaches an MD5 hash function (See col. 7, line 65 to col. 8, line 9). However, Judd is silent as
to wherein a hash function is used to hash each of the elements in order to determine which
of the plurality of lists each of the elements will be contained in. Fujiwara, on the other hand,
teaches using a hash function to hash each of the elements in order to determine which
of the plurality of lists each of the elements will be contained in (See col. 2, lines 57-62; col.
4, lines 47-62; col. 5, line 50 to col. 6, line 10, Fujiwara et al.). It would have been obvious to one
having ordinary skill in the art at the time the invention was made to use a hash function to
hash each of the elements in order to determine which of the plurality of lists each of the
elements will be contained in. The motivation would have been to reduce or remove duplicate
elements by using a hash function.

Regarding claim 55, Judd/Fujiwara discloses wherein each of the elements of a document
is an element that has been extracted from the document (See col. 7, lines 48-50, Judd et al.).

Regarding claim 56, Judd/Fujiwara discloses wherein each of the elements of a document
is a predetermined one of (A) a predetermined number of words (See col. 7, lines 48-50,
"document title", "document summary", Judd et al.), (B) a predetermined number of sentences
(See col. 7, lines 48-50, "document summary", Judd et al.), (C) a predetermined number of
characters, (D) a predetermined number of paragraphs, and (E) a predetermined number of
sections.

Regarding claim 57, Judd/Fujiwara discloses wherein each of the elements of a document
partially overlaps another of the elements of the document (See col. 7, lines 47-50, Judd et al.).

Regarding claim 67, this claim recites subject matter similar to that set forth above in claim
48 and is thus rejected on similar grounds.

8. Claims 50-54, 58-60, 61-63, and 64-66 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Bates (U.S. Patent No. 6,873,982), in view of Johnson (U.S. Patent No. 
5,850,490). 

Regarding claims 50-54, Bates discloses a machine-readable medium having stored 
thereon a plurality of records (See Fig. 4), each of the records comprising: 

a) a first field for storing a document identifier (Document Identifier field 102,
Fig. 4); and

b) a plurality of lists (Key 1 ... Key N, reference 106, Fig. 4), each of the plurality
of lists containing element of a document identified by the document identifier 
stored in the first field (See col. 9, lines 1-3), 

Bates teaches each of the plurality of lists containing one element of a document.
However, Bates is silent as to each of the plurality of lists containing more than one element of
a document. On the other hand, Johnson teaches each of the plurality of lists containing elements
of a document (See Fig. 16 and corresponding text, Johnson). It would have been obvious to one
having ordinary skill in the art at the time the invention was made to have each of the plurality
of lists contain more than one element of a document, as suggested by Johnson, because the
differences are found only in the nonfunctional descriptive material and do not alter how the
elements of the system function. Thus, this descriptive material will not distinguish the claimed
invention from the prior art in terms of patentability; see In re Gulack, 703 F.2d 1381, 217 USPQ
401, 404 (Fed. Cir. 1983); In re Lowry, 32 F.3d 1579, 32 USPQ2d 1031 (Fed. Cir. 1994).

Bates teaches a plurality of records organized into a table, each record reflecting a
document (Fig. 4 and col. 6, lines 33-45), wherein each record has a plurality of keyword
fields (lists). Since a document may not share the same keywords as other documents,
some of the key fields 106 include no keyword (element). Thus, Bates teaches
wherein at least some of the plurality of lists include different numbers of elements and wherein
at least one of the plurality of lists includes no elements, as per claims 50 and 51.

Bates teaches wherein contiguous elements in a document are not necessarily contiguous
elements of a list, as one having ordinary skill in the art would have recognized that the table in
Fig. 4 stores a plurality of key fields for keywords in documents; thus, keywords in a document
are not necessarily contiguous in the key fields, as per claim 52.

Bates teaches wherein, for each of the records, the number of lists is the same, as each of the
records has the same number of Key 1 to Key N fields, as per claim 53. Whether or not a document
shares keywords with other documents, every document record still has the same number of key
fields (lists), ranging from 1 to N, in the table; therefore, Bates teaches wherein a number of the
plurality of lists is independent of document size, as per claim 54.

Regarding claims 58, 61 and 64, Bates/Johnson discloses wherein each of the elements of
a document is an element that has been extracted from the document (See col. 9, lines 2-3, Bates
et al.).

Regarding claims 59, 62 and 65, Bates/Johnson discloses wherein each of the elements of
a document is a predetermined one of (A) a predetermined number of words, (B) a
predetermined number of sentences, (C) a predetermined number of characters (a word contains
a predetermined number of characters; See col. 9, lines 2-3), (D) a predetermined number of
paragraphs, and (E) a predetermined number of sections.

Regarding claims 60, 63 and 66, Bates/Johnson discloses wherein each of the elements of 
a document partially overlaps another of the elements of the document (See Fig. 4). 

Response to Arguments 
9. Applicant's arguments regarding the rejections of claims 48-67 in the last Office Action
have been fully considered, but they are not persuasive.

Regarding Applicant's argument on the § 101 rejection of claims 48, 50, 52, 53 and 67,
Applicant argues that these claims are more than a mere abstraction and that the claimed data
structures are specific structural elements in memory. However, merely storing data structures in
memory, i.e., merely claiming nonfunctional descriptive material stored in a machine-readable
medium, does not make the invention eligible for patenting. The claims do not satisfy the useful
result aspect of the practical application requirement, because it is not shown that their
execution accomplishes a practical application.

Applicant argues that Broder does not teach concluding that two documents are near-duplicates
if any one fingerprint of one of the documents matches any one fingerprint of the
other document, where each of the documents has at least two fingerprints. The Examiner
respectfully disagrees. Broder teaches that each of the documents has at least two fingerprints at
col. 2, lines 34-38 and col. 4, lines 19-24, wherein the set of shingles associated with a document
is reduced to unique identifications, and the unique identifications of a document can be computed
as digital fingerprints, corresponding to Applicant's at least two fingerprints for each document.
Broder also teaches that two documents are near-duplicates if any one fingerprint of one of the
documents matches any one fingerprint of the other document at col. 1, lines 4-7 ("like data
objects can be identified") and col. 10, lines 27-29 ("if two documents with identical fingerprints
are encountered").

Applicant argues that Johnson does not teach a plurality of lists, each of the plurality of
lists containing elements of a document. The Examiner respectfully disagrees. Figure 16 teaches
this limitation, as each of the records (record 452) of table 450 contains a plurality of lists
(segment class 1 corresponds to title, segment class 2 corresponds to author, segment class 3
corresponds to text, ...), and each of the plurality of lists contains elements of a document; for
example, segment class 3 corresponds to text that begins with the words "It was a dark..." (the
words correspond to elements).

Applicant argues that one skilled in the art would not have been motivated to combine
Johnson with Fujiwara. The Examiner respectfully disagrees. In response to Applicant's
argument that there is no suggestion to combine the references, the Examiner recognizes that
obviousness can only be established by combining or modifying the teachings of the prior art to
produce the claimed invention where there is some teaching, suggestion, or motivation to do so
found either in the references themselves or in the knowledge generally available to one of
ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988) and In
re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992). In this case, Fujiwara's hash function
is used to group data in records that belong to a particular column (See col. 2, lines 57-62), and
Johnson groups data into the corresponding column (for example, the title belongs to Segment
class 1). Thus, it would have been obvious to one having ordinary skill in the art at the time
the invention was made to combine Johnson with Fujiwara.

Applicant argues that it is unclear how any of the elements of Judd can be characterized as
lists. The Examiner respectfully disagrees. Col. 7, lines 45-50 states, "Each index
record also contains the location identifier of the current document, and may also contain
values of properties... such as document title, document summary"; therefore, the location
identifier, title and summary belong to "the current document".

Allowable Subject Matter 
10. Claims 46 and 47 are allowed. 

Regarding claims 1 and 11, the prior art of record fails to disclose or suggest the claimed
limitation of: comparing a cluster identifier of the one candidate search result with a cluster
identifier of the other candidate search result to determine that the one candidate search result is
a near-duplicate of the other candidate search result so that the one candidate search result is
rejected, thereby a filtered set of search results including only those of the plurality of candidate
search results that have not been rejected, in conjunction with the remaining, salient claim
provisions.

Conclusion 

11. Applicant's amendment necessitated the new ground(s) of rejection presented in this
Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). 
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event,
however, will the statutory period for reply expire later than SIX MONTHS from the date of this 
final action. 

12. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Merilyn P Nguyen whose telephone number is 571-272-4026. 
The examiner can normally be reached on M-F: 8:30 - 5:00. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Don Wong can be reached on 571-272-1834. The fax phone numbers for the 
organization where this application or proceeding is assigned are 571-273-8300 for regular 
communications and 703-746-7240 for After Final communications. 

Any inquiry of a general nature or relating to the status of this application or proceeding 
should be directed to the receptionist whose telephone number is 703-305-3900. 





MN 

June 08, 2007 